Multi-timescale PMSCs for Music Audio Classification

Author

  • Philippe Hamel
Abstract

Principal mel-spectrum components (PMSCs) [3] are computed at several timescales in parallel [2]. For each timescale, feature extraction involves four steps: discrete Fourier transform (DFT), mel-scaling, principal component analysis (PCA) whitening and temporal pooling.

First, for each timescale, we compute discrete Fourier transforms over a given window length. To compute PMSCs at different timescales, we simply use different window lengths for the DFT. In this system, we use a combination of 5 timescales: 46ms, 93ms, 186ms, 372ms and 743ms. We use the same frame step of 23ms for all timescales, so longer timescales have more overlap between consecutive frames.

Second, we pass the spectral amplitudes of the DFTs through a set of 200 mel-scaled triangular filters to obtain a set of spectral energy bands, and take the logarithm of the amplitude of those bands. Then, we compute the principal components of a random sub-sample of the dataset (roughly 60,000 frames). To obtain features with unit variance, we multiply each component by the inverse square root of its eigenvalue, a transformation known as PCA whitening. Here, the goal of the PCA is to diagonalize the covariance matrix, not to reduce dimensionality, so we keep all the principal components (yielding 200 dimensions per timescale). The PCA-whitened mel-scaled energy bands are referred to as PMSCs.

In the last step, we apply temporal pooling, i.e. we compute statistics over the PMSCs within a given window length. We chose a time window of roughly 1.5 seconds. Within this window, we compute the mean, the variance, the minimum and the maximum of each feature. Thus, for each timescale and for each 1.5-second window, we obtain 4 ∗ 200 = 800 features.
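The pipeline above can be sketched in plain numpy. This is a minimal illustration, not the authors' implementation: it uses a simplified triangular mel filterbank, a smaller filterbank (40 bands instead of 200), only three of the five timescales, and non-overlapping pooling windows; all function names and parameter values below are illustrative assumptions.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    # Simplified triangular filters spaced evenly on the mel scale.
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    hz_pts = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)  # rising slope
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)  # falling slope
    return fb

def log_mel_frames(signal, sr, win_len, hop, n_filters=40):
    # DFT magnitude per frame, then log of mel-band energies.
    fb = mel_filterbank(n_filters, win_len, sr)
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        frame = signal[start:start + win_len] * np.hanning(win_len)
        mag = np.abs(np.fft.rfft(frame))
        frames.append(np.log(fb @ mag + 1e-10))
    return np.array(frames)

def pca_whiten(X):
    # Keep ALL components; scale each by the inverse square root of
    # its eigenvalue so every component has (approximately) unit variance.
    mean = X.mean(axis=0)
    Xc = X - mean
    eigval, eigvec = np.linalg.eigh(Xc.T @ Xc / len(Xc))
    W = eigvec / np.sqrt(eigval + 1e-8)  # whitening matrix
    return Xc @ W

def temporal_pool(feats, pool_len):
    # Mean, variance, min and max over each pooling window
    # -> 4 * n_features dimensions per window.
    pooled = []
    for s in range(0, len(feats) - pool_len + 1, pool_len):
        w = feats[s:s + pool_len]
        pooled.append(np.concatenate([w.mean(0), w.var(0), w.min(0), w.max(0)]))
    return np.array(pooled)

np.random.seed(0)
sr = 22050
hop = 512                      # ~23 ms frame step, shared by all timescales
t = np.arange(sr * 5) / sr     # 5 s toy signal: 440 Hz tone plus noise
x = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.randn(len(t))

per_scale = []
for win in (1024, 2048, 4096):         # ~46 ms, ~93 ms, ~186 ms windows
    pmscs = pca_whiten(log_mel_frames(x, sr, win, hop))
    per_scale.append(temporal_pool(pmscs, 64))  # 64 frames ~= 1.5 s

# Longer windows yield fewer frames; align pooled windows across timescales.
n = min(len(p) for p in per_scale)
features = np.hstack([p[:n] for p in per_scale])
# Each row: 4 stats * 40 bands * 3 timescales = 480 features per 1.5 s window.
```

With the full configuration from the abstract (200 filters, 5 timescales) each 1.5-second window would instead yield 800 features per timescale.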


Similar articles

Building Musically-relevant Audio Features through Multiple Timescale Representations

Low-level aspects of music audio such as timbre, loudness and pitch, can be relatively well modelled by features extracted from short-time windows. Higher-level aspects such as melody, harmony, phrasing and rhythm, on the other hand, are salient only at larger timescales and require a better representation of time dynamics. For various music information retrieval tasks, one would benefit from m...


Mirex 2009 a Multi-feature-set Multi-classifier Ensemble Approach for Audio Music Classification

The approach of combining a multitude of audio features and also symbolic features (through transcription of audio to MIDI) for music classification proved useful, as shown previously. We extended the system submitted to MIREX 2008 by including temporal audio features, adding another audio analysis algorithm based on finding templates on music, enhancing the polyphonic audio to MIDI transcripti...


Multi-label classification of music by emotion

This work studies the task of automatic emotion detection in music. Music may evoke more than one different emotion at the same time. Single-label classification and regression cannot model this multiplicity. Therefore, this work focuses on multi-label classification approaches, where a piece of music may simultaneously belong to more than one class. Seven algorithms are experimentally compared...


Music classification by low-rank semantic mappings

A challenging open question in music classification is which music representation (i.e., audio features) and which machine learning algorithm is appropriate for a specific music classification task. To address this challenge, given a number of audio feature vectors for each training music recording that capture the different aspects of music (i.e., timbre, harmony, etc.), the goal is to find a ...


Prototyping a Vibrato-Aware Query-By-Humming (QBH) Music Information Retrieval System for Mobile Communication Devices: Case of Chromatic Harmonica

Background and Aim: The current research aims at prototyping query-by-humming music information retrieval systems for smart phones. Methods: This multi-method research follows simulation technique from mixed models of the operations research methodology, and the documentary research method, simultaneously. Two chromatic harmonica albums comprised the research population. To achieve the purpose ...



Journal:

Volume   Issue

Pages  -

Publication date: 2012